1 Introduction

Our work addresses the generation of software manuals
in French and English, starting from a semantic model
of the task to be documented (Paris et al., 1995). Our
prime concern is to be able to exercise control over
the mapping from the task model to the
generated text. We set out to establish whether the 
task model alone is sufficient to control the linguistic
output of a text generation system, or whether
additional control is required. If it is, an obvious
source to explore is the communicative purpose
of the author, which is not necessarily constant
throughout a manual. Indeed, in a typical software 
manual, it is possible to distinguish at least three 
sections, each with a different purpose: a tutorial 
containing exercises for new users, a series of
step-by-step instructions for the major tasks to be
accomplished, and a ready-reference summary of the
commands. 

We need, therefore, to characterise the linguis- 
tic expressions of the different elements of the task 
model, and to establish whether these expressions 
are sensitive or not to their context, that is, the 
functional section in which they appear. This pa- 
per presents the results of an analysis we conducted 
to this end on a corpus of software instructions in 
French. 

2 Methodology 

The methodology we employed is similar to that
endorsed by (Biber, 1995). It is summarised as follows:

1. Collect the texts and note their situational
characteristics. We consider two such characteristics:
task structure and communicative purpose.

2. Identify the range of linguistic features to be 
included in the analysis; 

3. Code the corpus in terms of the selected fea- 
tures; 

4. Compute the frequency count of each linguistic 
feature; 

5. Identify co-occurrences between linguistic fea- 
tures and the situational characteristics under 
consideration. 
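
Steps 4 and 5 can be illustrated with a minimal Python sketch. It is not the tooling used in the study: the coded units, the feature labels and the helper names `feature_frequencies` and `cooccurrence` are all hypothetical, chosen only to show how a frequency count and a cross-tabulation against a situational characteristic might be computed.

```python
from collections import Counter, defaultdict

# Hypothetical coded units: (task element, communicative purpose, mood feature).
# Illustrative values only, not data from the corpus.
CODED_UNITS = [
    ("substep", "step-by-step", "imperative"),
    ("substep", "step-by-step", "imperative"),
    ("substep", "ready-reference", "declarative"),
    ("goal", "step-by-step", "declarative"),
    ("result", "ready-reference", "declarative"),
]


def feature_frequencies(units):
    """Step 4: count how often each linguistic feature occurs."""
    return Counter(feature for _, _, feature in units)


def cooccurrence(units, characteristic_index):
    """Step 5: cross-tabulate features against one situational characteristic.

    characteristic_index is 0 for task element, 1 for communicative purpose.
    """
    table = defaultdict(Counter)
    for unit in units:
        table[unit[characteristic_index]][unit[2]] += 1
    return table


frequencies = feature_frequencies(CODED_UNITS)
by_element = cooccurrence(CODED_UNITS, 0)
by_purpose = cooccurrence(CODED_UNITS, 1)
```

In this toy data, `by_element["substep"]` shows the imperative dominating, while `by_purpose` shows that the same feature is unevenly spread across purposes, which is the kind of pattern step 5 is meant to surface.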

We first carried out a classical sublanguage analysis
on our corpus as a whole, without differentiating
between any of the situational characteristics
(Hartley and Paris, 1995). This initial description was
necessary to give us a clear statement of the linguistic
potential required of our text generator, to
which we could relate any restrictions on language
imposed by situational variables. Thus we can account
for language restrictions by appealing to general
discourse principles, in keeping with the
recommendations of (Kittredge, 1995) and (Biber, 1995)
for the definition of sublanguages.

We then correlated task elements with grammati- 
cal features. Finally, where linguistic realisation was 
under-determined by task structure alone, we estab- 
lished whether the communicative purpose provided 
more discriminating control over the linguistic re- 
sources available. 

3 Linguistic Framework: Systemic 
Functional Linguistics 

Our analysis was carried out within the framework
of Systemic-Functional Linguistics (SFL) (Halliday,
1978; Halliday, 1985), which views language as a
resource for the creation of meaning. SFL stratifies
meaning into context and language. The strata of 
the linguistic resources are organised into networks 
of choices, each choice resulting in a different mean- 
ing realised (i.e., expressed) by appropriate struc- 
tures. The emphasis is on paradigmatic choices, as 
opposed to syntagmatic structures. Choices made in 
each stratum constrain the choices available in the 
stratum beneath. Context thus constrains language. 

This framework was chosen for several reasons. 
First, the organisation of linguistic resources accord- 
ing to this principle is well-suited to natural lan- 
guage generation, where the starting point is nec- 
essarily a communicative goal, and the task is to 
find the most appropriate expression for the in- 
tended meaning ( Matthiessen and Bateman, 1991). 
Second, a functional perspective offers an advan- 
tage for multilingual text generation, because of its 
ability to achieve a level of linguistic description 
which holds across languages more effectively than 
do structurally-based accounts. The approach has
been shown capable of supporting the sharing of
linguistic resources between languages as structurally
distinct as English and Japanese (Bateman et al.,
1991a; Bateman et al., 1991b). It is therefore
reasonable to expect that at least the same degree of
commonality of description is achievable between
English and French within this framework. Finally,
KPML (Bateman, 1994), the tactical generator we
employ, is based on SFL, and it is thus appropriate
for us to characterise the corpus in terms immedi- 
ately applicable to our generator. 

4 Coding features 

Our lexico-grammatical coding was done using the
networks and features of the Nigel grammar (Halliday,
1985). We focused on four main concerns, guided by
previous work on instructional texts, e.g.,
(Lehrberger, 1986; Plum et al., 199C; Ghadessy, 1993;
Kosseim and Lapalme, 1994):



• Relations between processes: to determine 
whether textual cohesion was achieved 
through conjunctives or through relations 
implicit in the task structure elements. 
Among the features considered were clause 
dependency and conjunction type. 

• Agency: to see whether the actor perform- 
ing or enabling a particular action is clearly 
identified, and whether the reader is explic- 
itly addressed. We coded here for features 
such as voice and agent types. 



• Mood, modality and polarity: to find out 
the extent to which actions are presented 
to the reader as being desirable, possible, 
mandatory, or prohibited. We coded for 
both true and implicit negatives, and for 
both personal and impersonal expressions 
of modality. 

• Process types: to see how the domain is 
construed in terms of actions on the part 
of the user and the software. We coded for 
sub-categories of material, mental, verbal 
and relational processes. 

5 The Corpus 

The analysis was conducted on the French version
of the Macintosh MacWrite manual (Kaehler, 1983).
The manual is derived from an English source by a
process of adaptive translation (Sager, 1993), i.e.,
one which localises the text to the expectations of
the target readership. The fact that the translation
is adaptive rather than literal gives us confidence in
using this manual for our analysis.^1 Furthermore,
we know that Macintosh documentation undergoes
thorough local quality control. It certainly conforms
to the principles of good documentation established
by current research on technical documentation and
on the needs of end-users, e.g., (Carroll, 1994;
Hammond, 1994), in that it supplies clear and concise
information for the task at hand. Finally, we have
been assured by French users of the software that
they consider this particular manual to be well
written and to bear no unnatural trace of its origins.

Technical manuals within a specific domain
constitute a sublanguage, e.g., (Kittredge, 1982; Sager
et al., 1980). An important defining property of a
sublanguage is that of closure, both lexical and
syntactic. Lexical closure has been demonstrated by, for
example, (Kittredge, 1987), who shows that after as
few as the first 2000 words of a sublanguage text,
the number of new word types increases little if at
all. Other work, e.g., (Biber, 1988; Biber, 1989) and
(Grishman and Kittredge, 1986), illustrates the
property of syntactic closure, which means that generally
available constructions just do not occur in this or
^1 We would have preferred to use a manual which
originated in French to exclude all possibility of
interference from a source language, but this proved
impossible. Surprisingly, it appears that large French
companies often have their documents authored in English by
francophones and subsequently translated into French.
One large French software house that we contacted does
author its documentation in French, but had registered
considerable customer dissatisfaction with its quality.
We decided, therefore, that their material would be
unsuitable for our purposes.



Goals:       La sélection
Gloss:       Selection

             Pour sélectionner un mot, (faites un double-clic sur le mot)
Gloss:       To select a word, (do a double-click on the word)

Functions:   (Fermer -) Cet article permet de fermer une fenêtre activée
Gloss:       (Close -) This command enables you to close the active window

Constraints: Si vous donnez à votre document le titre d'un document déjà
             existant, (une zone de dialogue apparaît)
Gloss:       If you give your document the title of an existing document,
             (a dialog box appears)

Results:     (Choisissez Coller dans le menu Edition -) Une copie du contenu
             du presse-papiers apparaît
Gloss:       (Choose Paste from the Edit menu -) A copy of the content of
             the clipboard appears

Substeps:    Fermez la fenêtre Rechercher
Gloss:       Close the Find window

             Ensuite, on ouvre le document de destination
Gloss:       Next, one opens the target document

Figure 1: Examples of task element expressions



that sublanguage. In the light of these results, we 
considered a corpus of 15000 words to be adequate 
for our purposes, at least for an initial analysis. 

The MacWrite manual is organised into three 
chapters, corresponding to the three different
sections identified earlier: a tutorial, a series of
step-by-step instructions for the major word-processing
tasks, and a ready-reference summary of the com- 
mands. We omitted the tutorial because the gen- 
eration of such text is not our concern, retaining 
the other two chapters which provide the user with 
generic instructions for performing relevant tasks, 
and descriptions of the commands available within 
MacWrite. The overlap in information between the 
two chapters offers opportunities to observe differ- 
ences in the linguistic expressions of the same task 
structure elements in different contexts. 



6 Task Structure

Task structure is constituted by five types of task
elements, which we define below. We used the notion
of task structure element both as a contextual
feature for the analysis and to determine the
segmentation of the text into units. Each unit is taken
to be the expression of a single task element.

Our definition of the task elements is based on the
concepts and relations commonly chosen to represent
a task structure (a goal and its associated plan),
e.g., (Fikes and Nilsson, 1971; Sacerdoti, 1977), and
on related research, e.g., (Kosseim and Lapalme,
1994). Our generator produces instructions from an
underlying semantic knowledge base which uses this
representation (Paris et al., 1995). To generate an
instruction for performing a task is to choose some
task elements to be expressed and linearise them so
that they form a coherent set for a given goal the
user might have. We distinguish the following
elements, and provide examples of them in Figure 1.^2

goals: actions that users will adopt as goals and
which motivate the use of a plan.



functions: actions that represent the functionality
of an interface object (such as a menu item). A
function is closely related to a goal, in that it is
also an action that the user may want to perform.
However, the function is accessed through
the interface object, and not through a plan.

^2 The text in parentheses in the Figure is part of the
linguistic context of the task element rather than the
element itself.

constraints and preconditions: 

states which must hold before a plan can be 
employed successfully. The domain model dis- 
tinguishes constraints (states which cannot be 
achieved through planning) and preconditions 
(states which can be achieved through plan- 
ning). We do not make this distinction in 
the linguistic analysis and regroup these related 
task structure elements under one label. We 
decided to proceed in this way to determine at 
first how constraints in general are expressed. 
Moreover, it is not always clear from the text 
which type of constraint is expressed. Drawing 
too fine distinctions in the corpus analysis at 
this point, in the absence of a test for assigning 
a unit to one of these constraint types, would 
have rendered the results of the analysis more 
subjective and thus less reliable. 

results: states which arise as planned or unplanned 
effects of carrying out a plan. While it might 
be important to separate planned and un- 
planned effects in the underlying representa- 
tion, we again abstract over them in the lexico- 
grammatical coding. 

sub-steps: actions which contribute to the execu- 
tion of the plan. If the sub-steps are not prim- 
itive, they can themselves be achieved through 
other plans. 
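
The five element types defined above can be pictured as a small data structure. The following Python sketch is only an illustration under our own assumptions: the class names `Task`, `Plan` and `InterfaceObject` and the goal label "paste the clipboard content" are hypothetical and do not reproduce the actual knowledge base; the other example strings paraphrase Figure 1.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Plan:
    """A plan for achieving a goal (hypothetical representation)."""
    # Constraints and preconditions are merged, as in the linguistic analysis.
    constraints: List[str] = field(default_factory=list)
    # Sub-steps are themselves tasks, so they may carry their own plans.
    substeps: List["Task"] = field(default_factory=list)
    # Planned and unplanned effects are not distinguished here.
    results: List[str] = field(default_factory=list)


@dataclass
class Task:
    """An action the user may adopt as a goal."""
    goal: str
    plan: Optional[Plan] = None

    def is_primitive(self) -> bool:
        # A primitive action has no further plan to expand.
        return self.plan is None


@dataclass
class InterfaceObject:
    """An interface object (e.g. a menu item) and the function it gives access to."""
    name: str
    function: str


paste = Task(
    goal="paste the clipboard content",
    plan=Plan(
        substeps=[Task(goal="choose Paste from the Edit menu")],
        results=["a copy of the content of the clipboard appears"],
    ),
)
close_item = InterfaceObject(name="Close", function="close the active window")
```

Keeping functions on the interface object rather than inside a plan mirrors the distinction drawn above: a function is reached through the object, a goal through a plan.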

7 The Coding Procedure 

No tools exist to automate a functional analysis of 
text, which makes coding a large body of text a
time-consuming task. We first performed a detailed
coding of units of texts on approximately 25% of the
corpus, or about 400 units,^3 using the WAG coder
(O'Donnell, 1995), a tool designed to facilitate a 
functional analysis. 

We then used a public-domain concordance program,
MonoConc (Barlow, 1994), to verify the
representativeness of the results. We enumerated the
realisations of those features that the first analysis
had shown as marked, and produced KWIC^4 listings
for each set of realisations. We found that the
second analysis corroborated the results of the first, 
consistent with the nature of sublanguages. 



8 Distribution of Grammatical Features over
Task Structure and Communicative Purpose

We examined the correlations between lexico- 
grammatical realisations and task elements and com- 
municative purpose. The results are best expressed 
using tables generated by WAG: given any system, 
WAG splits the codings into a number of sets, one for 
each feature in that system. Percentages and means 
are computed, and the sets are compared statisti- 
cally, using the standard T-test. WAG displays the 
results with an indicator of how statistically signifi- 
cant a value is compared to the combined means in 
the other sets. The counts were all done using the 
local mean, that is, the feature count is divided by 
the total number of codings which select that fea- 
ture's system. Full definitions of the features can be 
found in (Halliday, 1985; Bateman et al., 1990). 
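
To make the notion of local mean concrete, here is a minimal Python sketch. It is not WAG's implementation: the function names, the dictionary-based coding format and the choice of a pooled-variance two-sample t statistic are assumptions made for illustration, since the exact variant of the t-test used by WAG is not stated here.

```python
from math import sqrt
from statistics import mean, variance


def local_mean(codings, system, feature):
    """Share of the codings that select `system` which also choose `feature`.

    Codings that never enter the system are excluded from the denominator.
    """
    selected = [coding[system] for coding in codings if system in coding]
    return sum(1 for value in selected if value == feature) / len(selected)


def t_statistic(sample_a, sample_b):
    """Two-sample Student's t with pooled variance (equal variances assumed).

    Each sample is a list of 0/1 indicators: 1 if the unit shows the feature.
    Undefined when both samples have zero variance.
    """
    n_a, n_b = len(sample_a), len(sample_b)
    pooled = ((n_a - 1) * variance(sample_a)
              + (n_b - 1) * variance(sample_b)) / (n_a + n_b - 2)
    return (mean(sample_a) - mean(sample_b)) / sqrt(pooled * (1 / n_a + 1 / n_b))


# Hypothetical codings: the third unit never enters the mood system.
codings = [
    {"mood": "imperative"},
    {"mood": "declarative"},
    {"polarity": "positive"},
    {"mood": "imperative", "polarity": "positive"},
]
imperative_share = local_mean(codings, "mood", "imperative")

# Hypothetical indicator samples for one set versus the combined others.
t_value = t_statistic([1, 1, 1, 0], [0, 0, 1, 0])
```

With these toy inputs the imperative share is two out of the three units that select the mood system, not two out of four, which is the point of using the local rather than the global mean.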

In some cases, the type of task element is on its 
own sufficient to determine, or at least strongly con- 
strain, its linguistic realisation. The limited space 
available here allows us to provide only a small
number of examples, shown in Figure 2. We see that the
use of modals is excluded in the expression of func- 
tion, result and constraint, whereas goal and sub- 
step do admit modals. As far as the polarity sys- 
tem is concerned, negation is effectively ruled out 
for function, goal and substep. Finally, with respect 
to the mood system, only substep can be realised 
through imperatives. 

In other cases, however, we observe a diversity of 
realisations. We highlight here three cases: modality 
in goal, polarity in constraint, and mood in substep. 
In such cases, we must appeal to another source of 
control over the apparently available choices. We 



^3 The authors followed guidelines for identifying task
element units which had yielded consistent results when
used by students coding other corpora.

^4 Key Word In Context



have looked to the construct of genre (Martin, 1992) 
to provide this additional control, on two grounds: 
(1) since genres are distinguished by their commu- 
nicative purposes, we can view each of the functional 
sections already identified as a distinct genre; (2) 
genre is presented as controlling text structure and 
realisation. In Martin's view, genre is defined as a 
staged, goal-oriented social process realised through 
register, the context of situation, which in turn is 
realised in language to achieve the goals of a text. 
Genre is responsible for the selection of a text struc- 
ture in terms of task elements. As part of the re- 
alisation process, generic choices preselect a register 
associated with particular elements of text structure, 
which in turn preselect lexico-grammatical features. 
The coding of our text in terms of genre and task
elements thus allows us to establish the role played
by genre in the realisations of the task elements. It 
Figure 2: Selective realisations of task elements

Figure 3: Distribution of task structure elements over genres
(columns: Ready-Reference, Procedure, Elaboration; rows: Sub-step,
Goal, Constraint, Result, Function)



will also allow us to determine the text structures 
appropriate in each genre, a study we are currently 
undertaking. This is consistent with other accounts 
of text structure for text generation in technical
domains, e.g., (McKeown, 1985; Paris, 1993; Kittredge
et al., 1991).

For those cases where the realisation remains 
under-determined by the task element type, we con- 
ducted a finer-grained analysis, by overlaying a genre 
partition on the undifferentiated data. We distin- 
guished earlier two genres with which we are con- 
cerned: ready-reference and step-by-step. In the 
manual analysed, we recognised two more specific 
communicative purposes in the step-by-step section: 
to enable the reader to perform a task, and to in- 
crease the reader's knowledge about the task, the 
way to achieve it, or the properties of the system 
as a whole. Because of their distinct communica- 
tive purposes, we again feel justified in calling these 
genres. We label them respectively procedure and 
elaboration. The intention that the reader should 
recognise the differences in function of each section 
is underscored by the use of distinctive typographical
devices, such as fonts and lay-out.^5



^5 See (Hartley and Paris, 1995) for examples extracted
from the manuals.



The first step at this stage of the analysis was to 
establish whether there was an effective overlap in 
task elements among the three genres under
consideration. The result of this step is shown in Figure 3.
Sub-step and goal are found in all three genres, while 
constraint, result and function occur in both ready- 
reference and elaboration but are absent from pro- 
cedure. 

The next step was to undertake a comparative 
analysis of the lexico-grammatical features found in 
the three genres. This analysis indicated that the 
language employed in these different sections of the 
text varies greatly. We summarise here the two 
genres that are strongly contrasted: procedure and 
ready-reference. Elaboration shares features with 
both of these. 

procedure: The top-level goal of the user is ex- 
pressed as a nominalisation. Actions to be 
achieved by the reader are almost exclusively 
realised by imperatives, directly addressing the 
reader. These actions are mostly material di- 
rected actions, and there are no causatives. Few 
modals are employed, and, when they are, it is 
to express obligation impersonally. The polar- 
ity of processes is always positive. Procedure 
employs mostly independent clauses, and, when 



             Procedure   Ready-Reference   Elaboration
Non-modal    100.0%      75.0%             72.6%
Modal        0.0%        25.0%             28.4%

Figure 4: Genre-related differences in the modal system for goal

             Ready-Reference   Elaboration
Negative     0.0%              41.7%
Positive     100%              58.3%

Figure 5: Genre-related differences in the polarity system for constraint

             Procedure   Ready-Reference   Elaboration
Imperative   97.3%       44.4%             77.6%
Declarative  2.7%        55.6%             22.4%

Figure 6: Genre-related differences in the mood system for substep



clause complexes are used, the conjunctions are 
mostly purpose (linking a user goal and an ac- 
tion) and alternative (linking two user actions 
or two goals). 

ready-reference: In this genre, all task elements 
are always realised through clauses. The declar- 
ative mood predominates, with few impera- 
tives addressing the reader. Virtually all the 
causatives occur here. On the dimension of 
modality, the emphasis is on personal possi- 
bility, rather than obligation, and on inclina- 
tion. We find in this genre most of the ver- 
bal processes, entirely absent from procedure. 
Ready-reference is more weighted than proce- 
dure towards dependent clauses, and is partic- 
ularly marked by the presence of temporal con- 
junctions. 

The analysis so far demonstrates that genre, like 
task structure, provides some measure of control 
over the linguistic resources but that neither of these 
alone is sufficient to drive a generation system. The 
final step was therefore to look at the realisations 
of the task elements differentiated by genre, in cases 
where the realisation was not strongly determined 
by the task element. 

We refer the reader back to Figure 2, and the
under-constrained cases of modality in goal, polarity
in constraint, and mood in substep. Figure 4
shows the realisations of the task element goal with
respect to the modal system, which brings into sharp
relief the absence of modality from procedure.
Figure 5 presents the realisations by genre of the
polarity system for constraint. We observe that only
positive polarity occurs in ready-reference. Finally,
we note from Figure 6 that the realisation of
substeps is heavily loaded in favour of imperatives in
procedure.

These figures show that genre does indeed provide 
useful additional control over the expression of task 
elements, which can be exploited by a text genera- 
tion system. Neither task structure nor genre alone 
is sufficient to provide this control, but, taken to- 
gether, they offer a real prospect of adequate control 
over the output of a text generator. 
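
One way to picture how a generator might exploit this joint control is a lookup keyed on task element, grammatical system and genre. The Python sketch below is purely illustrative: the percentages are copied from Figures 4 to 6, but the table layout and the `preferred_feature` helper are hypothetical and say nothing about how our generator is actually implemented.

```python
# Percentages copied from Figures 4-6; the table layout itself is hypothetical.
REALISATIONS = {
    ("goal", "modality"): {
        "procedure": {"non-modal": 100.0, "modal": 0.0},
        "ready-reference": {"non-modal": 75.0, "modal": 25.0},
        "elaboration": {"non-modal": 72.6, "modal": 28.4},
    },
    ("constraint", "polarity"): {
        # Constraints are absent from the procedure genre.
        "ready-reference": {"negative": 0.0, "positive": 100.0},
        "elaboration": {"negative": 41.7, "positive": 58.3},
    },
    ("substep", "mood"): {
        "procedure": {"imperative": 97.3, "declarative": 2.7},
        "ready-reference": {"imperative": 44.4, "declarative": 55.6},
        "elaboration": {"imperative": 77.6, "declarative": 22.4},
    },
}


def preferred_feature(element, system, genre):
    """Return the most frequent feature, or None if the combination is unattested."""
    distribution = REALISATIONS.get((element, system), {}).get(genre)
    if distribution is None:
        return None
    return max(distribution, key=distribution.get)


substep_in_procedure = preferred_feature("substep", "mood", "procedure")
substep_in_reference = preferred_feature("substep", "mood", "ready-reference")
```

The same task element (substep) yields a different preferred mood depending on the genre, which is exactly the additional discrimination that task structure alone does not provide.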

9 Related Work 

The results from our linguistic analysis are
consistent with other research on sublanguages in the
instructions domain, in both French and English,
e.g., (Kosseim and Lapalme, 1994; Paris and Scott,
1994). Our analysis goes beyond previous work by
identifying within the discourse context the means
for exercising explicit control over a text generator.

An interesting difference with respect to previous
descriptions is the use of the true (or direct)
imperative to express an action in the procedure genre,
as results from (Paris and Scott, 1994) seem to
indicate that the infinitive form of the imperative is
preferred in French. These results, however, were
obtained from a corpus of instructions mostly for
domestic appliances as opposed to software manuals.
Furthermore, the use of the infinitive form in
instructions in general, as observed by (Kocourek, 1982),
is declining, as some of the conventions already
common in English technical writing are being adopted
by French technical writers, e.g., (Timbal-Duclaux,
1990).



We also note that the patterns of realisations
uncovered in our analysis follow the principle of good
technical writing practice known as the minimalist
approach, e.g., (Carroll, 1994; Hammond, 1994).
Moreover, we observe that our corpus does not
exhibit shortcomings identified in a Systemic
Functional analysis of English software manuals (Plum
et al., 199C), such as a high incidence of agentless
passives and a failure to distinguish the function of
informing from that of instructing.

Other work has focused on the cross-linguistic
realisations of two specific semantic relations
(generation and enablement) (Delin et al., 1994; Delin et
al., 1996), in a more general corpus of instructions
for household appliances. Our work focuses on the 
single application domain of software instructions. 
However, it takes into consideration the whole task 
structure and looks at the realisation of semantic el- 
ements as found in the knowledge base, instead of 
two semantic relations not explicitly present in the 
underlying semantic model. 

10 Conclusion 

In this paper we have shown how genre and task 
structure provide two essential sources of control 
over the text generation process. Genre does so 
by constraining the selection of the task elements 
and the range of their expressions. These elements, 
which are the procedural representation of the user's 
tasks, constitute a layer of control which mediates 
between genre and text, but which, without genre, 
cannot control the grammar adequately. 

The work presented here is informing the devel- 
opment of our text generator by specifying the nec- 
essary coverage of the French grammar to be imple- 
mented, the required discourse structures, and the 
mechanisms needed to control them. We continue 
to explore further situational and contextual factors 
which might allow a system to fully control its avail- 
able linguistic resources. 